Sometimes, instead of data about isolated units (humans, usually) and their attributes/outcomes, we have relational data about the connections between units.
Networks are data structures for representing relational information.
Some people think networks are transcendental data structures that can reveal many hidden truths about the human condition. I don’t, but I do think networks are very useful.
But there are a lot of pitfalls in analyzing network data. Many data analysis heuristics and existing statistical tools do not work for network data, and may give misleading results.
Mathematicians call networks graphs. A graph consists of: - a set \(V\) of vertices (nodes, subjects, units, egos) - a set \(E\) of edges (links, arcs, connections, relationships) \[ E \subseteq \{ \{i,j\} :\ i,j\in V,\ i\neq j \} \]
\[V = \{1,2,3,4,5,6,7,8\}\] \[E = \{ \{1,2\}, \{2,3\}, \{2,4\}, \{2,7\}, \{3,7\}, \{4,5\}, \{5,7\}, \{5,6\}, \{6,8\}, \{7,8\} \}\]
| Network (\(G\)) | Vertices (\(V\)) | Edges (\(E\)) |
|---|---|---|
| Social | people | friendships |
| Needle | needles | people |
| Publication | articles | citations |
| Publication | authors | co-authorship |
| Phylogenetic | species | ancestral relationship |
| Trade | countries | trade relationships |
| Gene expression | genes | interactions |
| Brain | cortical regions | fiber tracts |
| Neural | neurons | axons |
| Injection | people | needle sharing partners |
Types of networks: directed/undirected
An undirected network has symmetric links: if \(i\) is connected to \(j\), then \(j\) is connected to \(i\).
A directed network has asymmetric links: an edge \(i \to j\) does not imply an edge \(j \to i\).
Edges in a directed graph are usually denoted as ordered pairs, e.g. \(E=\{ (i,j):\ i \to j\}\).
In a weighted network, each edge \(\{i,j\}\in E\) has a numeric attribute \(z_{ij}\).
Sometimes we define \(z_{ij}=0\) if \(\{i,j\}\notin E\).
Graphs can be directed or undirected, and weighted or unweighted.
There are lots of other types of graphs!
The degree of a vertex is the number of edges incident to it: \[d_i = |\{j\in V:\ \{i,j\}\in E\}|\]
Vertex 7 has degree \(d_7=4\). Sometimes social scientists call this the egocentric network size or just network size.
For directed graphs, we distinguish between the in-degree (number of incoming edges) and the out-degree (number of outgoing edges).
The degree sequence of a graph is the sequence (sometimes ordered) of vertex degrees.
Degree sequence: \(\mathbf{d}=(1,4,2,2,3,2,4,2)\)
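As a quick check, the degree sequence above can be computed directly from the example edge list; here is a minimal Python sketch:

```python
# Compute the degree sequence of the example graph G = (V, E).
V = [1, 2, 3, 4, 5, 6, 7, 8]
E = [(1, 2), (2, 3), (2, 4), (2, 7), (3, 7),
     (4, 5), (5, 7), (5, 6), (6, 8), (7, 8)]

degree = {i: 0 for i in V}
for i, j in E:
    # Each undirected edge contributes one to the degree of both endpoints.
    degree[i] += 1
    degree[j] += 1

d = [degree[i] for i in V]
print(d)  # [1, 4, 2, 2, 3, 2, 4, 2]
```

Note that the degrees sum to \(2|E|\), since every edge is counted twice.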
Does the degree sequence \(\mathbf{d}\) uniquely determine the topology of \(G\)?
The degree distribution is the frequency of each degree value in a graph.
Consider a graph \(G=(V,E)\).
Adjacency matrix: \(A_{ij} = 1\) if and only if \(\{i,j\}\in E\).
In general, using the adjacency matrix representation of a graph lets you do linear algebra.
Simple properties: the row sums of \(A\) are the vertex degrees, and \((A^k)_{ij}\) counts walks of length \(k\) from \(i\) to \(j\).
Spectral properties: the eigenvalues and eigenvectors of \(A\) (and of related matrices like the graph Laplacian) summarize connectivity and community structure.
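For instance, with numpy several of these properties can be read off directly from \(A\); a sketch using the example graph above:

```python
import numpy as np

# Adjacency matrix of the example 8-vertex graph (vertices 1..8 -> indices 0..7).
E = [(1, 2), (2, 3), (2, 4), (2, 7), (3, 7),
     (4, 5), (5, 7), (5, 6), (6, 8), (7, 8)]
A = np.zeros((8, 8))
for i, j in E:
    A[i - 1, j - 1] = A[j - 1, i - 1] = 1  # undirected: A is symmetric

degrees = A.sum(axis=1)                # row sums give vertex degrees
n_edges = A.sum() / 2                  # each undirected edge is counted twice
walks2 = np.linalg.matrix_power(A, 2)  # (A^k)_{ij} counts walks of length k
eigenvalues = np.linalg.eigvalsh(A)    # real spectrum since A is symmetric
```

The diagonal of \(A^2\) recovers the degrees (closed walks of length 2), and the eigenvalues sum to \(\mathrm{tr}(A)=0\).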
Visualizing a network requires a layout algorithm that tells your computer where to put each of the vertices relative to each other in two dimensions.
Layout algorithms try to place vertices in 2 dimensions so that connected vertices are drawn near each other and vertices and edges do not overlap too much.
I recommend:
www.hiveplot.com
Hu, Lu, Wu, “Spectrum-Based Network Visualization for Topology Analysis”, IEEE Computer Graphics and Applications 33, pages 58-68, 2013.
For a vertex \(i\), define - degree centrality: \(d_i\) - closeness centrality: \[ \frac{1}{\sum_{j\in V} \text{shortestPathDist}(i,j)} \] - betweenness centrality: \[ \sum_{j\neq k\neq i} \frac{\#\text{shortestPaths}(j\to i \to k)}{\#\text{shortestPaths}(j\to k)} \] - eigenvector centrality: \(x_i\) where \[ Ax = \lambda x\] defines the eigenvector \(x\) and its corresponding (largest) eigenvalue \(\lambda\). There are many other measures of centrality.
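As an illustration, eigenvector centrality can be computed by power iteration on \(A\). This is a minimal numpy sketch, not a reference implementation:

```python
import numpy as np

# Eigenvector centrality by power iteration: repeatedly apply A and renormalize,
# converging to the leading eigenvector (positive for a connected graph).
def eigenvector_centrality(A, n_iter=200):
    x = np.ones(A.shape[0])
    for _ in range(n_iter):
        x = A @ x
        x = x / np.linalg.norm(x)  # renormalize to avoid overflow
    return x

# The example 8-vertex graph from above.
E = [(1, 2), (2, 3), (2, 4), (2, 7), (3, 7),
     (4, 5), (5, 7), (5, 6), (6, 8), (7, 8)]
A = np.zeros((8, 8))
for i, j in E:
    A[i - 1, j - 1] = A[j - 1, i - 1] = 1

x = eigenvector_centrality(A)
# Vertices 2 and 7 (the degree-4 hubs) receive the largest scores.
```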
There are two main ways networks are divided into clusters/communities: - Hierarchical clustering iteratively combines (divides) vertices into sets based on their connectivity.
- Spectral clustering exploits algebraic (eigendecomposition) properties of the adjacency matrix \(A\) or the graph Laplacian \(L=D-A\).
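A minimal sketch of spectral bisection using the Laplacian \(L = D - A\); the toy graph and function name are illustrative:

```python
import numpy as np

# Spectral bisection: split vertices by the sign of the eigenvector of the
# second-smallest eigenvalue of L = D - A (the Fiedler vector).
def spectral_bisect(A):
    D = np.diag(A.sum(axis=1))
    L = D - A
    eigvals, eigvecs = np.linalg.eigh(L)  # ascending eigenvalues
    fiedler = eigvecs[:, 1]               # second-smallest eigenvector
    return (fiedler > 0).astype(int)      # 0/1 cluster labels

# Toy graph: two triangles joined by a single bridge edge.
A = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1
labels = spectral_bisect(A)
# Vertices {0,1,2} and {3,4,5} land in different clusters.
```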
Vertices and edges can have attributes, or observations associated with them.
- Let \(x_i\) be attributes of vertex \(i\in V\). - Let \(z_{ij}\) be attributes of the edge \(\{i,j\}\).
We might want to summarize node traits in the network by - degree assortativity: connections to nodes with similar degree - trait assortativity (homophily): connections to nodes with similar traits
Why do we need models for networks? Networks are complicated, and there are lots of them. By constructing models, we can learn about the structure of a complicated object using a small number of parameters.
One reason we need models is that there are a lot of networks:
Example For a graph \(G=(V,E)\) with \(n = |V| = 50\), there are approximately \(5.77\times 10^{368}\) possible undirected graphs and \(3.34\times 10^{737}\) directed graphs. There are about \(10^{80}\) atoms in the known universe.
This means that it can be prohibitively difficult to enumerate them for even modest \(n\).
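These counts are easy to reproduce with a quick Python check:

```python
from math import comb, log10

# Number of undirected simple graphs on n labeled vertices is 2^C(n,2);
# for directed graphs (no self-loops) it is 2^(n(n-1)).
n = 50
log10_undirected = comb(n, 2) * log10(2)  # ~368.8, i.e. about 5.8e368
log10_directed = n * (n - 1) * log10(2)   # ~737.5, i.e. about 3.3e737
```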
So it is helpful to have a model with fewer parameters than \(2^{\binom{n}{2}}\) that helps us summarize networks.
A network model is a set of networks \(\mathcal{G}\) and a probability function \(\Pr(\cdot)\ge 0\) that maps networks to probabilities such that \[ \sum_{G\in\mathcal{G}} \Pr(G) = 1. \]
Probabilistic network models and parameter estimation: If you have a probability model \(\Pr(\cdot|\theta)\), where \(\theta\) is a set of parameters, and you observe a network \(G\), you might want to find the value of \(\theta\) that maximizes the probability of the network. In other words, \[ \hat\theta = \arg\max_{\theta} \Pr(G|\theta) \] This is called “maximum likelihood”.
There are many other ways to estimate parameters. But the basic idea is to use an observed network \((G,X,Z)\) to tell you something about the model \(\Pr(G,X,Z|\theta)\) that might have generated that network.
Consider a graph of \(n=|V|\) vertices in which each edge exists independently with probability \(p\); this is the Erdős–Rényi (ER) model. Then the probability of observing a particular graph \(G=(V,E)\) is \[ \Pr(G|p) = p^{|E|} (1-p)^{\binom{n}{2} - |E|} \] Many analytic results can be derived for the ER model:
Also known as: Bernoulli model, homogeneous random graph
Is the ER model a good model for human social networks? Does the Erdős–Rényi model produce community structure?
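A minimal simulation sketch: sample from the ER model and recover \(p\) by maximum likelihood, \(\hat p = |E| / \binom{n}{2}\). The values of \(n\) and \(p\) here are arbitrary:

```python
import numpy as np
from math import comb

rng = np.random.default_rng(0)

def sample_er(n, p):
    """Sample an ER graph: each of the C(n,2) possible edges appears independently w.p. p."""
    A = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                A[i, j] = A[j, i] = 1
    return A

n, p = 200, 0.05
A = sample_er(n, p)
n_edges = A.sum() / 2
p_hat = n_edges / comb(n, 2)  # maximum likelihood estimate of p
```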
A simple model for graph edges, given vertex attributes, is \[ \log\left[\frac{\Pr(i \sim j)}{\Pr(i\nsim j)}\right] = \alpha + \beta f(x_i,x_j) \] where \(\alpha\) is an intercept and \(f(x_i,x_j)\) is some (symmetric) function:
You should use this model to explain the formation of a network if:
An exponential random graph model (ERGM) has probability \[ \Pr(G|\theta) = \frac{\exp[\theta'h(G)]}{\kappa(\theta)} \] where \(h(G)\) is some function of \(G\) (called the sufficient statistic), and \[ \kappa(\theta) = \sum_{g\in \mathcal{G}} \exp[\theta'h(g)] \] is a normalizing constant that ensures these probabilities sum to 1.
Good things:
Examples of ERGM sufficient statistics
We observe \(G\), we want to fit the ERGM \(\Pr(G|\theta)\) to learn about \(\theta\). \[ \ell(\theta) = \log\left[ \Pr(G|\theta)\right] = \theta'h(G) - \log\kappa(\theta) \] Taking the derivative (gradient) with respect to \(\theta\) (and then doing some algebra) yields the estimating equation \[ \nabla_{\!\theta} \ell(\theta) = h(G) - \text{E}_\theta[h(G)] = 0 \]
This suggests a stochastic algorithm for learning about \(\theta\):
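One such scheme is Robbins–Monro-style stochastic approximation: repeatedly simulate a graph from the current \(\theta\) and move \(\theta\) to shrink \(h(G_{\text{obs}}) - h(G_{\text{sim}})\). The sketch below uses the edges-only ERGM, where the model reduces to Erdős–Rényi with \(p = e^\theta/(1+e^\theta)\), so exact simulation is easy and the MLE is available in closed form for checking; the specific constants are illustrative:

```python
import numpy as np
from math import comb, log

rng = np.random.default_rng(1)

# Stochastic approximation for an edges-only ERGM with h(G) = #edges.
n, h_obs = 30, 130   # h_obs: a hypothetical observed edge count
m = comb(n, 2)       # number of possible edges

theta = 0.0
for t in range(1, 5001):
    p = 1.0 / (1.0 + np.exp(-theta))
    h_sim = rng.binomial(m, p)  # edge count of a graph simulated under theta
    # Newton-scaled Robbins-Monro step with decreasing gain 1/t:
    theta += (h_obs - h_sim) / (m * p * (1 - p) * t)

# At the MLE, E_theta[h(G)] = h_obs, so theta* = logit(h_obs / m).
theta_star = log(h_obs / (m - h_obs))
```

For general \(h\) (e.g. triangles), the exact binomial draw is replaced by MCMC simulation of graphs, but the update rule is the same.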
Model degeneracy occurs when the model places nearly all of its probability mass on a small number of extreme graphs (e.g. the empty or complete graph).
Let \(h(G) = (\text{\#edges}, \text{\#triangles})\). Let \(\Pr(G|\theta) \propto \exp[\theta'h(G)]\).
Rinaldo, Fienberg, Zhou. On the geometry of discrete exponential families with application to exponential random graph models. Electronic Journal of Statistics 3 (2009): 446-484.
Exchangeable graphs have the property that their distribution is invariant to permutation of the vertex labels.
The stochastic block model (SBM) is a graph in which each vertex \(i\in V\) has a categorical attribute \(x_i\in\{1,\ldots,K\}\). The attribute \(x_i\) is called the “block” membership of \(i\).
Define a \(K\times K\) matrix \[ P = \begin{pmatrix} p_{11} & p_{12} & \cdots & p_{1K} \\ p_{21} & p_{22} & \cdots & p_{2K} \\ \vdots & \vdots & \ddots & \vdots \\ p_{K1} & p_{K2} & \cdots & p_{KK} \\ \end{pmatrix} \] that defines the edge probabilities between each of the \(K\) “blocks”.
An edge between \(i\) and \(j\) is formed with probability \(P_{x_i,x_j}\).
See also: degree-corrected SBM.
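A minimal numpy sketch of simulating an SBM; the block sizes and matrix \(P\) are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulate a stochastic block model: edge {i,j} exists w.p. P[x_i, x_j].
def sample_sbm(x, P):
    n = len(x)
    A = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < P[x[i], x[j]]:
                A[i, j] = A[j, i] = 1
    return A

x = np.repeat([0, 1], 50)        # block memberships: two blocks of 50 vertices
P = np.array([[0.30, 0.02],
              [0.02, 0.30]])     # dense within blocks, sparse between blocks
A = sample_sbm(x, P)
```

With these parameters the within-block edge density is roughly 0.3 and the between-block density roughly 0.02, producing visible community structure.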
There are lots of other network models designed to capture important features of real-world networks.
One of the main features is a “heavy-tailed” degree distribution in which most vertices have small degree, and some have very large degree.
Further reading:
Dynamic or stochastic processes on networks are beyond the scope of this talk. However, you might be interested in further reading on:
First, think about the data. For example:
Next,
Beware the sub-sampled network!
Suppose \(G=(V,E)\) is a graph from a graph model \(\Pr(G)\). Suppose you obtain a randomly chosen vertex subset \(V_S\subset V\). You observe the *induced subgraph* \(G_S=(V_S,E_S)\), where \(\{i,j\}\in E_S\) if and only if \(i,j\in V_S\) and \(\{i,j\}\in E\).
Then in general \(G_S\) does *not* follow the graph model \(\Pr(G)\).
So, you cannot generalize findings from \(G_S\) to \(G\). In particular, sub-samples destroy triangles and other higher-order structure, so inferences about the topology of \(G_S\) are not representative of the topology of \(G\).
Exception: exchangeable models like ER and SBM.
See also: Chapter 5 of Kolaczyk, Statistical Analysis of Network Data.
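The triangle-destruction point can be illustrated numerically: when each vertex is retained with probability \(q\), an edge survives with probability \(q^2\) but a triangle survives only with probability \(q^3\). A small sketch (the graph and sample size are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)

def n_triangles(A):
    # trace(A^3) counts closed walks of length 3; each triangle is counted 6 times.
    return int(round(np.trace(np.linalg.matrix_power(A, 3)) / 6))

# Dense toy graph so that it has plenty of triangles.
n = 60
A = (rng.random((n, n)) < 0.3).astype(float)
A = np.triu(A, 1)
A = A + A.T                                        # symmetric, no self-loops

keep = rng.choice(n, size=n // 2, replace=False)   # random half of the vertices
A_sub = A[np.ix_(keep, keep)]                      # induced subgraph

# About (1/2)^2 of the edges survive, but only about (1/2)^3 of the triangles,
# so the subgraph is relatively triangle-poor.
```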
Suppose vertex attributes have an effect on both edge existence and vertex outcome. Then conditioning on the network \(A\) induces non-causal association between attributes and outcomes.
This is also called “collider” or “selection” bias. Be extra careful about ascribing a causal interpretation to “spillover” estimates in networks.
The standard network regression for a binary vertex outcome \(y_i\) is usually: \[ \log\left(\frac{\Pr(Y_i=1)}{\Pr(Y_i=0)}\right) = \alpha + x_i'\beta + \frac{\gamma}{|N_i|} \sum_{j\in N_i} y_j \]
where \(\alpha\) (intercept); \(\beta\) (individual effect); \(N_i\) (neighbors of \(i\)); \(\gamma\) (peer/spillover effect)
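For concreteness, the peer-exposure term \(\frac{1}{|N_i|}\sum_{j\in N_i} y_j\) can be built directly from the adjacency matrix; a small sketch with an arbitrary example graph:

```python
import numpy as np

# Build the peer-exposure covariate (1/|N_i|) * sum_{j in N_i} y_j
# from an adjacency matrix A and outcome vector y.
def peer_exposure(A, y):
    deg = A.sum(axis=1)
    deg = np.where(deg > 0, deg, 1)  # avoid dividing by zero for isolated vertices
    return (A @ y) / deg             # A @ y sums outcomes over each vertex's neighbors

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
y = np.array([1.0, 0.0, 1.0, 1.0])
peer_exposure(A, y)  # vertex 0's neighbors are {1, 2}, so its exposure is 0.5
```

This column is then entered as a regressor alongside \(x_i\) in the logistic model above.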
You can use it if you agree with all of these statements:
Estimating \(\gamma\neq 0\) does not imply contagion, spillover, or “peer effects”.
Consider the conditional linear model for the mean outcome \[ \text{E}[Y_i \mid y_{-i}] = \alpha + \beta'x_i + \frac{\gamma}{|N_i|} \sum_{j\in N_i} (y_j-\alpha-\beta'x_j) \] with \[ \text{Var}[Y_i \mid y_{-i}] = \sigma^2. \] This defines a Normal (Gaussian) conditional autoregression: it produces a Markov random field and yields a valid joint distribution.
Estimating \(\gamma\neq 0\) does not imply contagion, spillover, or “peer effects”.
General advice:
When subjects are independent, researchers sometimes *resample* outcomes and compute the distribution of an estimator under resampling.
*Bootstrap standard errors* for independent units that do not depend on asymptotic normality can be computed in this way.
Suppose you have a statistic (estimator) \(T(G)\).
Q: Can you resample vertices in \(G\) to get valid bootstrap standard errors? A: Basically never.
However, some classes of random graph model (exchangeable) have a distribution that is invariant to sampling subgraphs.
In no particular order:
books